Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add copy_symlinks keyword to Tar.tree_hash #167

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

mkitti
Copy link
Member

@mkitti mkitti commented Jan 8, 2024

Implement copy_symlinks keyword for Tar.tree_hash

For JuliaLang/Pkg.jl#3643
https://github.com/JuliaBinaryWrappers/P4est_jll.jl/releases/download/P4est-v2.8.1+2/P4est.v2.8.1.x86_64-w64-mingw32-mpi+microsoftmpi.tar.gz

julia> open(GzipDecompressorStream, "P4est.v2.8.1.x86_64-w64-mingw32-mpi+microsoftmpi.tar.gz") do io
           Tar.tree_hash(io, copy_symlinks=false)
       end
"89a337ea6f60a4fd58999ab73dea099e41032138"

julia> open(GzipDecompressorStream, "P4est.v2.8.1.x86_64-w64-mingw32-mpi+microsoftmpi.tar.gz") do io
           Tar.tree_hash(io, copy_symlinks=true)
       end
"ed75b82e0dd9b53c4ac4e70376f3e6f330c72767"

For JuliaPackaging/Yggdrasil#7888
https://github.com/JuliaBinaryWrappers/iso_codes_jll.jl/releases/download/iso_codes-v4.11.0+0/iso_codes.v4.11.0.any.tar.gz

julia> open(GzipDecompressorStream, "iso_codes.v4.11.0.any.tar.gz") do io
           Tar.tree_hash(io, copy_symlinks=false)
       end
"71f68a3d55d73f2e15a3969c241fae2349b1feb5"

julia> open(GzipDecompressorStream, "iso_codes.v4.11.0.any.tar.gz") do io
           Tar.tree_hash(io, copy_symlinks=true)
       end
"409d6ac4c02dae43ff4fe576b5c5820d0386fb3f"

Copy link

codecov bot commented Jan 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.64%. Comparing base (6269b5b) to head (0bcb514).
Report is 10 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #167      +/-   ##
==========================================
+ Coverage   97.28%   97.64%   +0.36%     
==========================================
  Files           4        4              
  Lines         810      851      +41     
==========================================
+ Hits          788      831      +43     
+ Misses         22       20       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mkitti mkitti marked this pull request as ready for review January 8, 2024 10:51
@nhz2
Copy link
Member

nhz2 commented Jan 13, 2024

Can the julia compat and CI tests be bumped to 1.6?

Tar.tree_hash(hdr->false, tar, algorithm="git-sha256", copy_symlinks=true)
end
NON_STDLIB_TESTS && begin
iso_codes_tarball = Downloads.download("https://github.com/JuliaBinaryWrappers/iso_codes_jll.jl/releases/download/iso_codes-v4.11.0+0/iso_codes.v4.11.0.any.tar.gz")
Copy link
Member

@nhz2 nhz2 Jan 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test should use a generated test tar file, like in the "copy symlinks" testset below. The tests will be more reliable if they don't require internet access.

@nhz2
Copy link
Member

nhz2 commented Jan 13, 2024

Adding the following test results in a stack overflow.

    tarball, io = mktemp()
    tar_write_link(io, "a", "b/q")
    tar_write_link(io, "b", "a/q")
    close(io)
    Tar.tree_hash(tarball; copy_symlinks=true)
StackOverflowError:
  Stacktrace:
    [1] exec(re::Ptr{Nothing}, subject::String, offset::Int64, options::UInt32, match_data::Ptr{Nothing})
      @ Base.PCRE ./pcre.jl:203
    [2] exec_r_data
      @ Base.PCRE ./pcre.jl:220 [inlined]
    [3] _findnext_re(re::Regex, str::String, idx::Int64, match_data::Ptr{Nothing})
      @ Base ./regex.jl:442
    [4] findnext
      @ Base ./regex.jl:431 [inlined]
    [5] iterate
      @ Base ./strings/util.jl:556 [inlined]
    [6] iterate
      @ Base ./strings/util.jl:555 [inlined]
    [7] _collect(cont::UnitRange{Int64}, itr::Base.SplitIterator{String, Regex}, ::Base.HasEltype, isz::Base.SizeUnknown)
      @ Base ./array.jl:770
    [8] collect
      @ ./array.jl:759 [inlined]
    [9] #split#487
      @ ./strings/util.jl:628 [inlined]
   [10] split
      @ ./strings/util.jl:626 [inlined]
   [11] link_target(paths::Dict{String, Any}, path::String, link::String)
      @ Tar ~/github/Tar.jl/src/extract.jl:175
   [12] link_target(paths::Dict{String, Any}, path::String, link::String) (repeats 13767 times)
      @ Tar ~/github/Tar.jl/src/extract.jl:193
   [13] git_tree_hash(predicate::Function, tar::IOStream, ::Type{SHA.SHA1_CTX}, skip_empty::Bool, copy_symlinks::Bool; buf::Vector{UInt8})
      @ Tar ~/github/Tar.jl/src/extract.jl:264
   [14] git_tree_hash
      @ ~/github/Tar.jl/src/extract.jl:208 [inlined]
   [15] #98
      @ ~/github/Tar.jl/src/Tar.jl:408 [inlined]
   [16] open(f::Tar.var"#98#100"{Bool, Bool, Tar.var"#1#2"}, args::String; kwargs::@Kwargs{lock::Bool})
      @ Base ./io.jl:396
   [17] open_nolock
      @ ~/packages/julias/julia-1.10/share/julia/stdlib/v1.10/ArgTools/src/ArgTools.jl:35 [inlined]
   [18] arg_read
      @ ~/packages/julias/julia-1.10/share/julia/stdlib/v1.10/ArgTools/src/ArgTools.jl:74 [inlined]
   [19] tree_hash(predicate::Function, tarball::String; algorithm::String, skip_empty::Bool, copy_symlinks::Bool)
      @ Tar ~/github/Tar.jl/src/Tar.jl:407
   [20] tree_hash
      @ Tar ~/github/Tar.jl/src/Tar.jl:398 [inlined]
   [21] #tree_hash#102
      @ Tar ~/github/Tar.jl/src/Tar.jl:425 [inlined]
   [22] macro expansion
      @ ~/github/Tar.jl/test/runtests.jl:12 [inlined]
   [23] macro expansion
      @ ~/packages/julias/julia-1.10/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
   [24] top-level scope
      @ ~/github/Tar.jl/test/runtests.jl:4
   [25] include(fname::String)
      @ Base.MainInclude ./client.jl:489
   [26] top-level scope
      @ none:6
   [27] eval
      @ Core ./boot.jl:385 [inlined]
   [28] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:291
   [29] _start()
      @ Base ./client.jl:552
Test Summary:   | Error  Total  Time
copy symlinks 2 |     1      1  1.6s
ERROR: LoadError: Some tests did not pass: 0 passed, 0 failed, 1 errored, 0 broken.
in expression starting at /home/nathan/github/Tar.jl/test/runtests.jl:3
ERROR: Package Tar errored during testing

@nhz2
Copy link
Member

nhz2 commented Jan 13, 2024

The following test results in a stack overflow from a different function:

    tarball, io = mktemp()
    tar_write_dir(io,  "foo/bar")
    tar_write_link(io, "foo/bar/b", "../a")
    tar_write_link(io, "foo/a", "bar/")
    close(io)
    Tar.tree_hash(tarball; copy_symlinks=true)
StackOverflowError:
  Stacktrace:
       [1] (::Tar.var"#34#44"{Dict{String, Any}, Tar.var"#by#43"})(io::IOBuffer)
         @ Tar ~/github/Tar.jl/src/extract.jl:329
       [2] sprint(::Function; context::Nothing, sizehint::Int64)
         @ Base ./strings/io.jl:114
       [3] sprint
         @ ./strings/io.jl:107 [inlined]
       [4] git_object_hash(emit::Function, kind::String, ::Type{SHA.SHA1_CTX})
         @ Tar ~/github/Tar.jl/src/extract.jl:347
       [5] (::Tar.var"#hash_tree#42"{SHA.SHA1_CTX})(node::Dict{String, Any})
         @ Tar ~/github/Tar.jl/src/extract.jl:328
       [6] (::Tar.var"#34#44"{Dict{String, Any}, Tar.var"#by#43"})(io::IOBuffer)
         @ Tar ~/github/Tar.jl/src/extract.jl:330
  --- the last 5 lines are repeated 6881 more times ---
   [34412] sprint(::Function; context::Nothing, sizehint::Int64)
         @ Base ./strings/io.jl:114
   [34413] sprint
         @ ./strings/io.jl:107 [inlined]
   [34414] git_object_hash(emit::Function, kind::String, ::Type{SHA.SHA1_CTX})
         @ Tar ~/github/Tar.jl/src/extract.jl:347
   [34415] (::Tar.var"#hash_tree#42"{SHA.SHA1_CTX})(node::Dict{String, Any})
         @ Tar ~/github/Tar.jl/src/extract.jl:328
   [34416] git_tree_hash(predicate::Function, tar::IOStream, ::Type{SHA.SHA1_CTX}, skip_empty::Bool, copy_symlinks::Bool; buf::Vector{UInt8})
         @ Tar ~/github/Tar.jl/src/extract.jl:338
   [34417] git_tree_hash
         @ ~/github/Tar.jl/src/extract.jl:209 [inlined]
   [34418] #98
         @ ~/github/Tar.jl/src/Tar.jl:408 [inlined]
   [34419] open(f::Tar.var"#98#100"{Bool, Bool, Tar.var"#1#2"}, args::String; kwargs::@Kwargs{lock::Bool})
         @ Base ./io.jl:396
   [34420] open_nolock
         @ ~/packages/julias/julia-1.10/share/julia/stdlib/v1.10/ArgTools/src/ArgTools.jl:35 [inlined]
   [34421] arg_read
         @ ~/packages/julias/julia-1.10/share/julia/stdlib/v1.10/ArgTools/src/ArgTools.jl:74 [inlined]
   [34422] tree_hash(predicate::Function, tarball::String; algorithm::String, skip_empty::Bool, copy_symlinks::Bool)
         @ Tar ~/github/Tar.jl/src/Tar.jl:407
   [34423] tree_hash
         @ Tar ~/github/Tar.jl/src/Tar.jl:398 [inlined]
   [34424] #tree_hash#102
         @ Tar ~/github/Tar.jl/src/Tar.jl:425 [inlined]
   [34425] macro expansion
         @ ~/github/Tar.jl/test/runtests.jl:13 [inlined]
   [34426] macro expansion
         @ ~/packages/julias/julia-1.10/share/julia/stdlib/v1.10/Test/src/Test.jl:1577 [inlined]
   [34427] top-level scope
         @ ~/github/Tar.jl/test/runtests.jl:6
   [34428] include(fname::String)
         @ Base.MainInclude ./client.jl:489
   [34429] top-level scope
         @ none:6
   [34430] eval
         @ Core ./boot.jl:385 [inlined]
   [34431] exec_options(opts::Base.JLOptions)
         @ Base ./client.jl:291
Test Summary:   | Error  Total   Time
copy symlinks 2 |     1      1  59.1s
ERROR: LoadError: Some tests did not pass: 0 passed, 0 failed, 1 errored, 0 broken.
in expression starting at /home/nathan/github/Tar.jl/test/runtests.jl:3
ERROR: Package Tar errored during testing

<\details>

@mkitti
Copy link
Member Author

mkitti commented Jan 13, 2024

What does extract do in this circumstance?

@nhz2
Copy link
Member

nhz2 commented Jan 14, 2024

extract has a stack overflow in the first case and goes into an infinite loop in the second case.

@mkitti
Copy link
Member Author

mkitti commented Jan 16, 2024

We should determine the correct behavior for extract first. My sense is that it should produce a more intuitive and descriptive error.

@nhz2
Copy link
Member

nhz2 commented Jan 16, 2024

That makes sense, though sometimes extract will ignore symlinks that it cannot copy. This might be the same issue as #77 . Maybe these issues could be resolved while doing the refactoring described in #64

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Jan 16, 2024

Yeah, this is a known issue for extract: #77. I just never got around to fixing it because it seems deeply unlikely in real-world tarballs, but of course, we should handle it correctly. But it's very annoying to fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants